Auditory and somatosensory systems play a key role in speech motor control. In the act 
of speaking, segmental speech movements are programmed to reach phonemic sensory 
goals, which in turn are used to estimate actual sensory feedback in order to further control 
production. The adult's tendency to automatically imitate a number of acoustic-phonetic 
characteristics in another speaker's speech however suggests that speech production not 
only relies on the intended phonemic sensory goals and actual sensory feedback but also 
on the processing of external speech inputs. These online adaptive changes in speech 
production, or phonetic convergence effects, are thought to facilitate conversational 
exchange by contributing to setting a common perceptuo-motor ground between the 
speaker and the listener. In line with previous studies on phonetic convergence, we 
here demonstrate, in a non-interactive situation of communication, online unintentional 
and voluntary imitative changes in relevant acoustic features of acoustic vowel targets 
(fundamental and first formant frequencies) during speech production and imitation. 
In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only 
after vowel production and imitation but also after auditory categorization of the 
acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of 
phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, 
speech production continuously draws on perceptual learning from the external speech 
environment. 



Keywords: phonetic convergence, imitation, speech production, speech perception, sensory-motor interactions, 
internal models, perceptual learning 



INTRODUCTION 

Speech production is a complex multistage motor process 
that requires phonetic encoding, initiation and coordination of 
sequences of supra-laryngeal and laryngeal movements produced 
by the combined actions of the pulmonary/respiratory system, the 
larynx and the vocal tract. Influential models of speech motor 
control postulate that auditory and somatosensory representa- 
tions also play a key role in speech production. It is proposed that 
segmental speech movements are programmed to reach phone- 
mic auditory and somatosensory goals, which in turn are used 
to estimate actual sensory inputs during speech production (for 
reviews, see Perkell et al., 1997, 2000; Perrier, 2005, 2012; Guenther, 
2006; Guenther and Vladusich, 2012; Perkell, 2012). The rela- 
tionships between speech motor commands and sensory feedback 
are thought to be progressively learned by the central nervous 
system during native (and foreign) language acquisition, lead- 
ing to the establishment of mature phonemic sensory-motor 
goals. 



In adult/fluent speech production, a large number of studies 
employing manipulations of both somatosensory and auditory 
feedback also support the hypothesis that sensory feedback plays 
an important role in tuning speech motor control. For instance, 
transient transformations of both the auditory and somatosen- 
sory feedback, due to unexpected dynamical mechanical loading 
of supra-laryngeal articulators, result in on-line and rapid artic- 
ulatory adjustments in speech production (Folkins and Abbs, 
1975; Abbs and Gracco, 1984; Gracco and Abbs, 1985). Similarly, 
online modifications of the auditory feedback in its pitch (Elman, 
1981; Burnett et al., 1998; Jones and Munhall, 2000), vowel for- 
mant frequencies (Houde and Jordan, 1998; Jones and Munhall, 
2000; Houde et al., 2002; Purcell and Munhall, 2006a,b; Cai 
et al., 2011; Rochet-Capellan and Ostry, 2011, 2012) or frica- 
tive first spectral moment (Shiller et al., 2009, 2010) also induce 
compensatory changes in speech production. Finally, although 
auditory information is often assumed to be the dominant sen- 
sory modality, the integration of somatosensory information in 
the achievement of speech movements has also been demonstrated 
(Tremblay et al., 2003; Nasir and Ostry, 2006; Feng et al., 
2011; Lametti et al., 2012). Importantly, these studies not only 
demonstrate online motor corrections to counteract the effect of 
perturbations, but also a persistence of those corrections (i.e., an 
after-effect) once the perceptual manipulation is removed (Houde 
and Jordan, 1998; Jones and Munhall, 2000; Houde et al., 2002; 
Tremblay et al., 2003; Nasir and Ostry, 2006; Purcell and Munhall, 
2006b; Shiller et al., 2009). The fact that motor compensatory 
adjustments do not disappear immediately likely reflects a global 
temporary remapping, or re-calibration, of the sensory-motor 
relationships. 

Due to the intrinsic temporal limitations of the biological 
feedback systems, the concepts of efference copy (von Holst and 
Mittelstaedt, 1950) and internal models (Francis and Wonham, 
1976; Kawato et al., 1987) have been introduced in order to 
explain how the central nervous system rapidly reacts to pertur- 
bations and adjusts fine-grained motor parameters (Guenther, 
1995; Perkell et al., 1997; Guenther et al., 1998; Houde and Jordan, 
1998; for recent reviews, see Perkell et al., 2000; Guenther, 2006; 
Hickok et al., 2011; Houde and Nagarajan, 2011; Guenther and 
Vladusich, 2012; Hickok, 2012; Perkell, 2012; Perrier, 2012). 

During language acquisition, perceptuo-motor goals that 
define successful speech motor acts are thought to be gradually 
explored and acquired in interaction with adult speakers (Kuhl 
and Meltzoff, 1996; Kuhl et al., 1997; Kuhl, 2004). The relationships 
between speech motor commands and sensory feedback 
signals are then progressively learned by the central nervous sys- 
tem, and stored in the form of an internal forward model. The 
internal forward model allows for the prediction of the sensory 
consequences of speech motor movements in relation with the 
intended sensory speech goals. These internal sensory predic- 
tions, generated prior to the actual motor execution and sensory 
feedback, can assist in speech motor control. In case of discrep- 
ancy between the internal sensory predictions and the actual 
sensory feedback, corrective motor commands are estimated 
in order to further control production. Such corrective motor 
commands from the internal forward model allow refining and 
updating the relationships between the intended sensory speech 
goals and the relevant sequence of motor commands, which are 
then stored in an internal inverse model. Once the inverse model 
has been learned, it is hypothesized that speech production, 
in mature/fluent speech and in normal circumstances, operates 
almost entirely under the internal inverse model and feedfor- 
ward control mechanisms (for recent reviews, see Guenther and 
Vladusich, 2012; Perkell, 2012; Perrier, 2012). From that view, 
the intended phonemic sensory goal allows the internal inverse 
model to internally specify the relevant speech motor sequences, 
without involvement of the internal forward model and sensory 
feedback control mechanisms, thus compensating for the delay 
inherent in sensory feedback. On the other hand, sensory feed- 
back can still be used for online corrective motor adjustments, 
in case of external perturbations, in the comparison between 
internal sensory predictions from the forward model and actual 
sensory inputs. 

The above-mentioned studies and models demonstrate a key 
role of on-line auditory and somatosensory feedback control 
mechanisms in speech production and suggest that speech 
goals are defined in multi-dimensional motor, auditory and 
somatosensory spaces. However, for all their importance, these 
studies fail to reveal the extent to which speech perception 
and production systems may be truly integrated when speak- 
ing. First, individual differences in perceptual capacities may 
also act on speech production. From that view, a recent study 
on healthy adults, with no reported impairment of hearing or 
speech, demonstrates that individual differences in auditory dis- 
crimination abilities influence the degree to which speakers adapt 
to altered auditory feedback (Villacorta et al., 2007; but see 
Feng et al., 2011). Second, many studies of adaptation in speech 
production have focused primarily on the flexibility of motor pro- 
cesses, without regard for possible adaptive changes of phonemic 
sensory representations that are presumed to constitute the sen- 
sory goals of speech movements (except during language acqui- 
sition and the learning of internal models). However, two studies 
involving altered auditory or somatosensory feedback show com- 
pensatory changes not only in production of a speech sound, 
but also in its perception (Nasir and Ostry, 2009; Shiller et al., 
2009). These results thus suggest plasticity of phonemic sensory 
representations in relation to adjustment of motor commands. 
Finally, the adult's tendency to automatically imitate a number 
of acoustic-phonetic characteristics in another speaker's speech 
suggests that speech production relies not only on the intended 
phonemic sensory goals and actual sensory feedback but also on 
the processing of external speech inputs. 

In keeping with this latter finding, the present study aimed at 
investigating adaptive plasticity of phonemic sensory-motor goals 
in speech production, based on either unintentional or voluntary 
vowel imitation. In addition to speech motor control, the working 
hypothesis of the present study capitalizes on previous studies on 
perceptual learning and speech imitation as well as on the theoret- 
ical proposal of a functional coupling between speech perception 
and action systems. 

In this framework, it is worthwhile noting that speech and 
vocal imitation is one of the basic mechanisms governing the 
acquisition of spoken language by children (Kuhl and Meltzoff, 
1996; Kuhl et al., 1997; Kuhl, 2004). In adults, unintentional 
speech imitation, or phonetic convergence, has been found to 
also occur in the course of a conversational interaction (for 
recent reviews, see Babel, 2009; Aubanel, 2011; Lelong, 2012). 
The behavior of each talker can evolve with respect to that of the 
other talker in two opposite directions: it may become more sim- 
ilar to the other talker's behavior (a phenomenon referred to as 
convergence) or more dissimilar. Convergence effects have been 
shown to be systematic and recurrent, and manifest themselves 
under many different forms, including posture (Shockley et al., 
2003), head movements and facial expressions (Estow et al., 2007; 
Sato and Yoshikawa, 2007) and, regarding speech, vocal intensity 
(Natale, 1975; Gentilucci and Bernardis, 2007), speech rate (Giles 
et al., 1991; Bosshardt et al., 1997), voice onset time (Flege, 1987; 
Flege and Eefting, 1987; Sancier and Fowler, 1997; Fowler et al., 
2008), fundamental frequency and pitch curve (Gregory, 1986; 
Gregory et al., 1993; Bosshardt et al., 1997; Kappes et al., 2009; 
Babel and Bulatov, 2012), formant frequencies and spectral 
distributions (Gentilucci and Cattaneo, 2005; Delvaux and Soquet, 
2007; Gentilucci and Bernardis, 2007; Aubanel and Nguyen, 2010; 
Lelong and Bailly, 2011). Apart from directly assessing phonetic 
convergence on acoustic parameters, other studies measured con- 
vergence by means of perceptual judgments, mostly using AXB 
tests (Goldinger, 1998; Goldinger and Azuma, 2004; Pardo, 2006; 
Pardo et al., 2010; Kim et al., 2011). Importantly, phonetic con- 
vergence has been shown to manifest in a variety of ways. Some 
involve natural settings, as during conversational exchange when 
exposure to the speech of others leads to phonetic convergence 
with that speech (Natale, 1975; Pardo, 2006; Aubanel and Nguyen, 
2010; Pardo et al., 2010; Kim et al., 2011; Lelong and Bailly, 2011), 
or when exposure to a second language influences speech pro- 
duction of a native language, and vice-versa (Flege, 1987; Flege 
and Eefting, 1987; Sancier and Fowler, 1997; Fowler et al., 2008). 
Others involve non-interactive situations of communication, as 
when hearing and/or seeing a recorded speaker influences the 
production of similar or dissimilar speech sounds (Goldinger 
and Azuma, 2004; Gentilucci and Cattaneo, 2005; Delvaux and 
Soquet, 2007; Gentilucci and Bernardis, 2007; Kappes et al., 
2009; Babel and Bulatov, 2012). Altogether, these phenomena of 
"speech accommodation" may facilitate conversational exchange 
by contributing to setting a common ground between speakers 
(Giles et al., 1991). In that respect, they may have the same effect 
as so-called alignment mechanisms, which are assumed to apply 
to linguistic representations at different levels between partners, 
in order for these partners to have a better joint understand- 
ing of what they are talking about (Garrod and Pickering, 2004; 
Pickering and Garrod, 2004, 2007). 

Apart from social attunement, can phonetic convergence also 
be explained at a more basic sensory-motor level? In our view, 
phonetic convergence necessarily involves complex sensorimotor 
interactions that allow the speaker to compare or tune his/her 
own sensory and motor speech repertoire with the phonetic 
characteristics of the perceived utterance. Since phonetic con- 
vergence implies perception of speech sounds prior to actual 
speech production, phonetic convergence is likely to first rely 
on perceptual processing and learning from the external speech 
environment, leading to adaptive plasticity of phonemic sensory 
goals. 

From that view, a significant body of speech perception 
research has demonstrated that sensory representations of speech 
sounds are flexible in response to changes in the sensory and lin- 
guistic aspects of speech input (e.g., Ladefoged and Broadbent, 
1957; Miller and Liberman, 1979; Mann and Repp, 1980). In 
addition, studies on perceptual learning, or perceptual recali- 
bration, have provided evidence for increased performance in 
speech perception/recognition and changes in perceptual rep- 
resentations after exposure to only a few speech sounds (e.g., 
Nygaard and Pisoni, 1998; Bertelson et al., 2003; Norris et al., 
2003; Clarke and Garrett, 2004; Kraljic and Samuel, 2005, 2006, 
2007; McQueen et al., 2006; Bradlow and Bent, 2008). In addition 
to perceptual learning, it is also worth noting that several 
psycholinguistic and neurobiological models of speech perception argue 
that phonetic interpretation of sensory speech inputs is deter- 
mined, or at least partly constrained, by articulatory procedural 
knowledge (Liberman et al., 1967; Liberman and Mattingly, 1985; 
Fowler, 1986; Liberman and Whalen, 2000; Schwartz et al., 2002, 
2012; Scott and Johnsrude, 2003; Callan et al., 2004; Galantucci 
et al., 2006; Wilson and Iacoboni, 2006; Skipper et al., 2007; 
Rauschecker and Scott, 2009). These models postulate that sen- 
sorimotor interactions play a key role in speech perception, with 
the motor system thought to partly constrain phonetic inter- 
pretation of the sensory inputs through the internal generation 
of candidate articulatory categories. Taken together, these stud- 
ies and models thus suggest that listeners maintain perceptual 
and motor representations that incorporate fine-grained infor- 
mation about specific speech sounds, speakers, and situations. 
Hence, during speech production, phonetic convergence may 
arise from induced plasticity of phonemic sensory and motor 
representations, in relation to relevant adjustment of motor 
commands. 

To extend the above-mentioned findings on phonetic conver- 
gence and to further test adaptive plasticity of phonemic sensory- 
motor goals in speech production, the present study aimed at 
investigating, in a non-interactive situation of communication, 
both unintentional and voluntary imitative changes in relevant 
acoustic features of acoustic vowel targets during speech produc- 
tion and imitation. A second goal of this study was to test offline 
perceptuo-motor recalibration processes (i.e., after-effects) after 
vowel production, imitation, and categorization. 

METHODS 
PARTICIPANTS 

Three groups of twenty-four healthy adults, native French speakers, 
participated in the production, imitation and categorization 
experiments (12 females and 12 males per group). In order to test 
possible relationships between phonetic convergence and volun- 
tary imitation, a subgroup of 12 subjects (6 females and 6 males) 
participated in both the production and imitation experiments 
(see Procedure). All participants had normal or corrected-to- 
normal vision, and reported no history of speaking, hearing or 
motor disorders. 

STIMULI 

Multiple utterances of /i/, /e/, and /ɛ/ steady-state French vowels 
were individually produced from a visual orthographic target and 
recorded by six native French speakers (3 females and 3 males) in 
a sound-attenuated room. In order to cover the typical range of 
F0 values during vowel production for male and female speakers, 
the six speakers were selected with respect to their largely 
distinct fundamental frequency (F0) values during vowel production 
(see below). None of the speakers participated in the three 
experiments. 

Throughout this study, the focus was put on the main determinant 
of voice characteristics, namely F0, leaving aside a number 
of other possible acoustic parameters that could also provide 
targets for convergence phenomena (e.g., voice quality, F0 variations 
inside the spoken utterances, intensity, duration, etc.). In the same 
vein, the focus was put on one of the main characteristics 
of vowels' phonetic quality, namely F1, considering that 
in the set of unrounded front vowels used here, F1 is both the 
basic cue for distinguishing these vowels from one another (see 
for example Menard et al., 2002) and shows large variations 
from one French speaker to another (e.g., Menard et al., 2008). 
This also leaves aside a number of other acoustic determinants of 
phonetic quality such as F2, but also F3, which is known to play 
an important role in the front unrounded region, particularly for 
/i/ and to a lesser extent for /e/. The choice to focus on acoustic 
variables considered a priori as the main characteristics in each 
domain seemed adequate in order to focus on major phenomena 
and to avoid difficult, and largely unsolved, questions associated 
with the weighting of perceptual cues in a given perceptual 
domain. 

One token of each vowel was selected per speaker and digitized 
in an individual sound file at a sampling rate of 44.1 kHz with 
16-bit quantization. Using Praat software (Boersma and 
Weenink, 2013), each vowel was scaled to 75 dB and cut, at zero 
crossing points, from the vocalic onset to 250 ms following it. 
F0 and first formant (F1) values were then calculated for each 
vowel from a period defined as ±25 ms of the maximum peak 
intensity (see Table 1). With this procedure, the stimuli differed 
in F0 and F1 values according to both gender and speaker (mean 
F0 averaged across vowels: 100-120-136 Hz and 196-249-296 Hz 
for the three male and the three female speakers, respectively; 
mean F1 for /i/, /e/, and /ɛ/ vowels: 258-314-496 Hz and 
285-414-646 Hz for the three male and the three female speakers, 
respectively). 

EXPERIMENTAL PROCEDURE 

The three experiments were carried out in a sound-proof room. 
Participants sat in front of a computer monitor at a distance of 
approximately 50 cm. The acoustic stimuli were presented at a 
comfortable sound level through a loudspeaker, with the same 
sound level set for all participants. The Presentation software 
(Neurobehavioral Systems, Albany, CA) was used to control the 
stimulus presentation during all experiments, and to record key 
responses in the categorization experiment (see below). All par- 
ticipants' productions were recorded for off-line analyses. The 
experimental design and apparatus were identical in all experi- 
ments, except the task required during the presentation of the 
acoustic stimuli (i.e., vowel production, vowel imitation and 
vowel categorization; see Figure 1). 

Table 1 | F0 and F1 values (in Hz) of /i/, /e/, /ɛ/ target vowels according to 
the six recorded speakers (3 females/males). 

                        F0                   F1 
Vowel   Gender     S1    S2    S3       S1    S2    S3 
/i/     Female     210   251   288      285   269   301 
/e/     Female     190   248   290      389   399   453 
/ɛ/     Female     187   249   284      693   577   668 

                        F0                   F1 
Vowel   Gender     S4    S5    S6       S4    S5    S6 
/i/     Male       137   120   103      278   248   247 
/e/     Male       139   120    98      390   324   228 
/ɛ/     Male       132   121   100      510   440   538 

• Production experiment: The experiment was designed to 
test phonetic convergence on acoustically presented vowels 
and to measure the magnitude of such online automatic 
imitative changes as well as possible offline perceptuo-motor 
recalibration due to phonetic convergence (after-effects). To 
this aim, participants were instructed to produce distinct 
vowels (/i/, /e/, or /ɛ/), one at a time, according to either a 
visual orthographic or an acoustic vowel target. Importantly, 
no instructions to "repeat" or to "imitate" the acoustic 
targets were given to the participants. Moreover, all participants 
were naive as to the purpose of the experiment. A block 
design was used where participants produced vowels according 
first to orthographic targets (baseline), then to acoustic 
targets (phonetic convergence) and finally to orthographic targets 
(after-effect). This procedure allowed comparing participants' 
productions (1) between the first presentation of the 
orthographic targets and the following presentation of the acoustic 
targets in order to determine possible convergence effects on F0 
and F1 values according to the acoustic targets and (2) between 
the first and last presentations of the orthographic targets in 
order to determine possible after-effects. 

• Imitation experiment: To compare phonetic convergence and 
voluntary imitation of the acoustic vowels, the second group of 
participants performed the same experiment except that they 
were explicitly asked to imitate the acoustic targets. The only 
indication given to the participants was to imitate the voice 
characteristics of the perceived speaker. 

• Categorization experiment: The third experiment was designed 
to test whether after-effects can occur without prior unin- 
tentional/automatic or voluntary vowel imitation but after 
auditory categorization of the acoustic targets. To this aim, par- 
ticipants were instructed to produce vowels according to the 
orthographic targets and to manually categorize the acoustic 
vowel targets, without overt production. During the categorization 
task, participants were instructed to produce a motor 
response by pressing, with their right hand, one of three keys 
corresponding to the /i/, /e/, or /ɛ/ vowels, respectively. 

Each experiment consisted of three experimental blocks, 
involving the acoustic targets previously recorded by either the 
three female or the three male speakers. In each block, the /i/, 
/e/, and /ɛ/ acoustic targets were related to a single speaker. 
With this procedure, F0 values of the vowel targets remained 
similar within each block while F1 varied according to each 
vowel type. The block order (across the three speakers) was 
fully counterbalanced across participants. In each experiment, 
six female and six male participants were presented with acous- 
tic targets from the female speakers and six female and six male 
participants were presented with acoustic targets from the male 
speakers. This procedure allowed testing possible differences in 
imitative changes and after-effects depending on the participant's 
and the speaker's acoustic space congruency (i.e., female/female 
and male/male vs. female/male and male/female participants/ 
speakers). 
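The counterbalancing of block order can be illustrated with a minimal sketch. The text only states that the order of the three speakers was fully counterbalanced across participants; the cyclic assignment below, the participant IDs and the function name `assign_block_orders` are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def assign_block_orders(participant_ids, speakers):
    """Assign speaker orders so that every possible order is used equally often.

    Hypothetical helper: the source does not describe how orders were assigned.
    """
    orders = list(permutations(speakers))  # 3 speakers -> 6 possible orders
    if len(participant_ids) % len(orders) != 0:
        raise ValueError("participants cannot be split evenly across orders")
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


# Hypothetical IDs for the 12 participants presented with the female speakers.
participants = [f"P{n:02d}" for n in range(1, 13)]
schedule = assign_block_orders(participants, ("S1", "S2", "S3"))

# How many participants received each of the six orders.
order_counts = Counter(schedule.values())
```

With 12 participants per speaker gender and six possible orders, each order is used twice under this scheme.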

Each experimental block consisted of the orthographic pre- 
sentation of the three vowels (presented 5 times in a random 
order) then the acoustic presentation of the three vowels (ran- 
domly presented 10 times) and finally the orthographic pre- 
sentation of the three vowels (randomly presented 5 times). 

FIGURE 1 | Experimental procedure. (A) In each block, participants were 
consecutively presented with orthographic, acoustic (previously recorded 
from a single speaker) and again orthographic vowel targets. The 
experimental design and apparatus were identical in the production, imitation 
and categorization experiments, except for the task required during the 
presentation of the acoustic targets (i.e., vowel production, vowel imitation, 
and manual vowel categorization). This procedure allowed determining 
unintentional/automatic imitative changes in the production experiment or 
voluntary imitative changes in the imitation experiment, as well as possible 
after-effects in the three experiments. (B) The experiments consisted of three 
blocks involving different acoustic targets (/i/, /e/, or /ɛ/ vowels from 3 male 
or female speakers). (C) Left: Example of production changes (here for F0) 
observed for one participant from the presentation of the visual target 
(productionV) to the presentation of the acoustic target (productionA). 
Middle-Right: A correlation was performed on 9 median data points 
(3 blocks x 3 vowels) in order to determine a possible relationship between 
acoustic changes of participants' productions (Y-axis) and the acoustic 
differences between the acoustic targets and the baseline (X-axis; see text 
for details). 



Since perceptual learning from the external speech environ- 
ment likely operated throughout the experiment, the last ortho- 
graphic presentation of the vowels served as the first sub-block 
in the following experimental block. In each sub-block, each 
trial started with an orthographic or an acoustic target for 
250 ms, a blank screen for 500 ms, a fixation cue (the "+" sym- 
bol) presented in the middle of the screen for 250 ms, and 
ended with a blank screen for 2000 ms. In order to limit possible 
close-shadowing effects (Porter and Lubker, 1980), participants 
were instructed to produce their response only after the 
presentation of the "+" symbol. Hence, the intertrial interval 
was 3 s. 
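As a quick check of the design arithmetic, the trial timeline and trial counts described above can be tallied as follows. This is only a sketch: the constant names are ours, and the assumption that participants in the categorization experiment produced no utterance on acoustic trials follows from the "without overt production" instruction.

```python
# Trial timeline in milliseconds, as described in the procedure.
TRIAL_EVENTS_MS = {
    "target": 250,            # orthographic or acoustic target
    "blank_before_cue": 500,
    "fixation_cue": 250,      # the "+" symbol
    "blank_after_cue": 2000,
}
trial_duration_ms = sum(TRIAL_EVENTS_MS.values())

N_VOWELS = 3
ORTHO_REPS = 5       # repetitions per vowel in an orthographic sub-block
ACOUSTIC_REPS = 10   # repetitions per vowel in an acoustic sub-block
N_BLOCKS = 3         # one block per speaker
N_PER_GROUP = 24     # participants per experiment

ortho_trials = N_VOWELS * ORTHO_REPS
acoustic_trials = N_VOWELS * ACOUSTIC_REPS

# The last orthographic sub-block of a block doubles as the baseline of the
# next one, so there is a single initial baseline followed by three
# (acoustic + orthographic) pairs.
trials_per_participant = ortho_trials + N_BLOCKS * (acoustic_trials + ortho_trials)

# Spoken responses: every trial in production and imitation, orthographic
# trials only in categorization (assumption stated above).
spoken_categorization = ortho_trials + N_BLOCKS * ortho_trials
total_utterances = N_PER_GROUP * (2 * trials_per_participant + spoken_categorization)

session_minutes = trials_per_participant * trial_duration_ms / 60000
```

Under these assumptions each trial lasts 3000 ms, each participant completes 150 trials (7.5 min of trial time, in line with a total duration of around 10 min), and the three groups together yield the 8640 utterances reported in the acoustic analyses.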

The total duration of each experiment was around 10 min. The 
experiments were preceded by a brief training session. A debrief- 
ing was carried out at the end of each experiment. Importantly, 
none of the participants reported having voluntarily imitated 
the acoustic stimuli in the production experiment. Note that the 
subgroup of subjects who participated in both the production 
and imitation experiments always performed the production 
experiment first. 
ACOUSTIC ANALYSES 

All acoustic analyses were performed using Praat software. 
A semi-automatic procedure was first devised for segmenting 
participants' recorded vowels (8640 utterances). For each participant, 
the procedure involved the automatic segmentation of each 
vowel using an intensity- and duration-based detection algorithm. 
Based on minimal duration and low intensity energy parame- 
ters, the algorithm automatically identified pauses between each 
vowel and set the vowel's boundaries on that basis. If necessary, 
these boundaries were hand-corrected, based on waveform and 
spectrogram information. Omissions, wrong productions and 
hesitations were manually identified and removed from the analyses. 
Finally, for each vowel, F0 and F1 values were calculated from 
a period defined as ±25 ms of the maximum peak intensity of the 
sound file. 
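The kind of intensity- and duration-based segmentation described above can be sketched in plain Python. This is not the Praat procedure used in the study: the function name `segment_by_intensity`, the frame size, the intensity threshold and the minimal pause and vowel durations are all illustrative assumptions.

```python
import math


def segment_by_intensity(samples, sample_rate, frame_ms=10.0,
                         threshold_db=-35.0, min_pause_ms=100.0,
                         min_vowel_ms=50.0):
    """Return (start_s, end_s) spans whose frame level exceeds a threshold.

    All default parameter values are assumptions, not the study's settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    n_frames = len(samples) // frame_len

    # Mark each frame as loud or quiet from its RMS level in dB (re. 1.0).
    loud = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        loud.append(level_db > threshold_db)

    # Collect runs of consecutive loud frames as [start, end) frame indices.
    runs, start = [], None
    for i, is_loud in enumerate(loud + [False]):
        if is_loud and start is None:
            start = i
        elif not is_loud and start is not None:
            runs.append([start, i])
            start = None

    # Merge runs separated by a pause shorter than the minimal pause duration.
    min_pause_frames = min_pause_ms / frame_ms
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_pause_frames:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # Discard spans shorter than the minimal vowel duration.
    min_vowel_frames = min_vowel_ms / frame_ms
    frame_s = frame_len / sample_rate
    return [(s * frame_s, e * frame_s) for s, e in merged
            if e - s >= min_vowel_frames]
```

Boundaries obtained this way would still need visual inspection against the waveform and spectrogram, as was done in the study.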

The mean percentage of errors was 2.8, 1.2, and 1.2% in the 
production, imitation, and categorization experiments, respec- 
tively, with no participant exceeding the error limit of 10%. 
For each experiment, median F0 and F1 values calculated on all 
participants' productions confirmed a standard distribution for 
the /i/, /e/, and /ɛ/ French vowels, with differences mainly due to 
gender (see Table 2). 

RESULTS 

For each participant and each sub-block, median F0 and F1 values 
were first computed for the /i/, /e/, and /ɛ/ vowels and expressed in 
bark [i.e., $13 \arctan(0.00076 f) + 3.5 \arctan((f/7500)^2)$, with $f$ in 
Hz; Zwicker and Fastl, 1990]. For each experiment, median F0 and F1 
values exceeding ±2 standard deviations (SD; computed on the set of 
median values for the 24 participants) were removed from the analyses. 
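The bark conversion and the ±2 SD exclusion criterion can be sketched as follows. The conversion uses the standard form of the Zwicker formula, with the factor 13 on the first term. The function names are ours, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import math
from statistics import mean, stdev


def hz_to_bark(f_hz):
    """Convert a frequency in Hz to bark (standard Zwicker-type formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def drop_outliers(values, n_sd=2.0):
    """Keep values within n_sd sample standard deviations of the mean.

    Assumption: sample (n - 1) standard deviation.
    """
    m, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]
```

For instance, `hz_to_bark(1000.0)` is about 8.5 bark, and `drop_outliers` applied to the 24 per-participant medians of one experiment returns the values retained for analysis.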

PHONETIC CONVERGENCE AND VOLUNTARY IMITATION (PRODUCTION 
AND IMITATION EXPERIMENTS, SEE FIGURE 2) 

Table 2 | Median F0 and F1 values of /i/, /e/, /ɛ/ produced vowels 
averaged over all participants' productions according to gender in 
Experiments A-C. 

                   Experiment A    Experiment B    Experiment C 
Vowel   Gender     F0     F1       F0     F1       F0     F1 
/i/     Female     225    277      221    295      222    276 
/e/     Female     220    416      216    417      215    407 
/ɛ/     Female     214    613      211    610      210    607 
/i/     Male       128    277      124    270      130    269 
/e/     Male       125    372      120    366      126    365 
/ɛ/     Male       123    508      119    522      125    545 

We here tested whether unintentional and voluntary imitation 
would result in shifting F0 and/or F1 toward the corresponding 
value for the acoustic target. To this aim, we first calculated 
acoustic changes of participants' productions between the 
presentation of acoustic targets and visual targets (baseline). For each 
participant and block, median F0 and F1 values produced in the 
baseline (i.e., median F0 and F1 values produced in the preceding 
sub-block during the presentation of the corresponding 
orthographic targets) were subtracted from those produced during the 
presentation of each type of acoustic target (i.e., /i/, /e/, or /ɛ/). 
Next, we calculated acoustic differences between the acoustic targets 
and the baseline. These two sets of data, calculated on both F0 and 
F1 values, were then correlated in order to determine a possible 
relationship between acoustic changes of participants' productions 
and the acoustic differences between the acoustic targets and 
the baseline (see Figure 1C). For each participant, one set of 9 
correlation points (from 3 blocks and 3 vowels) was therefore 
calculated for both F0 and F1, and one single-subject slope coefficient 
for each acoustic parameter was estimated from these values by 
means of linear regression. In order to keep the data sets 
homogeneous, slope coefficients exceeding ±2 SD were removed from 
the following analyses (corresponding to one participant in both 
the production and imitation experiments for F0, and two and 
one participants in the production and imitation experiments, 
respectively, for F1; see Figure 2). 
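The single-subject slope coefficient described above can be sketched as an ordinary least-squares fit over the nine median data points. The function name, the inclusion of an intercept in the regression, and the example values (in bark) are assumptions for illustration; they are not data from the study.

```python
def convergence_slope(baseline, produced, target):
    """Least-squares slope of production change on target-baseline distance.

    x = target - baseline, y = produced - baseline (one value per block and
    vowel). A slope of 0 means no imitative change; a slope of 1 means that
    productions fully matched the acoustic targets.
    """
    xs = [t - b for t, b in zip(target, baseline)]
    ys = [p - b for p, b in zip(produced, baseline)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("target-baseline differences must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical medians (3 blocks x 3 vowels) for one participant who moves
# halfway from the baseline toward each acoustic target.
baseline = [1.20, 1.25, 1.30, 1.10, 1.15, 1.20, 1.30, 1.35, 1.40]
target = [2.00, 2.05, 2.10, 2.40, 2.45, 2.50, 2.70, 2.75, 2.80]
produced = [b + 0.5 * (t - b) for b, t in zip(baseline, target)]
slope = convergence_slope(baseline, produced, target)
```

In this constructed example the slope is 0.5, which is of the same order as the mean F0 slope reported for the imitation experiment.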

For both F0 and F1 slope coefficients, the remaining data were
entered into analyses of variance (ANOVA) with the experiment
(phonetic convergence, imitation) and the acoustic space
congruency (same vs. different gender of the model speaker and the
participant) as between-subject variables. In addition, individual
one-tailed t-tests were performed for each experiment in order
to test whether the mean slope coefficient was significantly greater
than zero. Finally, in order to test whether imitative changes on
F0 and F1 may correlate, a Pearson's correlation analysis was
performed between single subject slope coefficients on F0 and F1 for
each experiment.
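
As an illustration of the test against zero, the following sketch computes the one-sample t statistic and its degrees of freedom from a list of slope coefficients. The function name `one_sample_t` and the input values are hypothetical; the sketch only assumes the standard one-sample t-test and is not taken from the original analysis.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(slopes, mu0=0.0):
    """Return (t, df) for the null hypothesis mean(slopes) == mu0."""
    n = len(slopes)
    standard_error = stdev(slopes) / sqrt(n)  # sample SD / sqrt(n)
    t = (mean(slopes) - mu0) / standard_error
    return t, n - 1

# Invented slope coefficients for three participants.
t, df = one_sample_t([0.1, 0.2, 0.3])
print(round(t, 3), df)
```

The one-tailed p-value is the upper-tail probability of Student's t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.t.sf(t, df)` returns it.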

For F0, ANOVA on single subject slope coefficients showed a
significant effect of the task [F(1, 42) = 27.16, p < 0.001], with
stronger imitative changes according to the acoustic targets during
the imitation task compared to the production task (mean
slope coefficients of 0.08 and 0.48 in the production and imitation
experiments, respectively). However, neither an effect of the
acoustic space congruency [F(1, 42) = 0.05] nor a task × acoustic-space
congruency interaction [F(1, 42) = 0.01] was observed. In addition,
slope coefficients differed significantly from zero in both the
production [t(22) = 3.99, p < 0.001] and imitation [t(22) = 7.11,
p < 0.001] experiments.

For F1, there was also a significant effect of the task [F(1, 41) =
4.95, p < 0.04], with stronger imitative changes during the
imitation task compared to the production task (mean slope
coefficients of 0.04 and 0.13 in the production and imitation
experiments, respectively). As for F0, neither an effect of the
acoustic space congruency [F(1, 41) = 0.24] nor a task × acoustic-space
congruency interaction [F(1, 41) = 0.45] was observed. Slope
coefficients also differed significantly from zero in both the
production [t(21) = 2.78, p < 0.02] and imitation [t(22) = 4.21,
p < 0.001] experiments.

In addition, Pearson's correlation analyses showed no significant
correlation between single subject slope coefficients observed
for imitative changes on F0 and F1 in either the production
(r = 0.08, slope = 0.06) or the imitation (r = 0.03, slope = 0.01)
experiment.

In sum, for both F0 and F1 values, these results demonstrate
online imitative changes according to the acoustic vowel targets
during production and imitation tasks, with stronger imitative
changes in the voluntary vowel imitation task and a lower, albeit
significant, phonetic convergence effect in the vowel production
task. Interestingly, these effects were observed independently of
the participant and speaker acoustic space congruency. Finally,
it is worth noting the large variability across participants,
especially in the production task.

FIGURE 2 | Imitative changes observed during vowel production (in
green) and imitation (in blue) on F0 (top) and F1 (bottom) values.
Left: All correlation-points based on 6480 utterances related to
participants' imitative changes during the presentation of acoustic targets
compared to their baseline productions (X-axis: acoustic target minus
subject's production during the presentation of visual targets, Y-axis:
subjects' production during the presentation of acoustic targets minus
subject's production during the presentation of visual targets; all values
are expressed in bark). Middle: Individual slope coefficients related to
participants' imitative changes according to the task (x-axes are subjects,
ordered by increasing slope coefficient; slope coefficients exceeding ±2
SD are represented in white and were removed from the analyses).
Right: Mean slope coefficients related to participants' imitative changes
according to the task and the acoustic space congruency (error bars
represent standard error of the mean). *Significant effects (p < 0.05) are
indicated.

AFTER-EFFECTS (PRODUCTION, IMITATION AND CATEGORIZATION 
EXPERIMENTS, SEE FIGURE 3) 

We also tested possible perceptuo-motor recalibration, i.e.,
after-effects, compared to the participant's baseline. For each
participant, block and vowel, median F0 and F1 values produced
during the preceding baseline were subtracted from those produced
during the second presentation of the orthographic targets.
As previously, for each participant, one set of 9 correlation-points
(from 3 blocks and 3 vowels) was therefore calculated for both
F0 and F1 (see Figure 3) and single subject slope coefficients were
estimated from these values by means of linear regressions. Slope
coefficients exceeding ±2 SD were removed from the following
analyses (corresponding to one, two and two participants in the
production, imitation, and categorization experiments,
respectively, for F0, and two participants in each of the production,
imitation, and categorization experiments for F1, see Figure 3).
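
The construction of one after-effect correlation-point can be sketched as below. The figure captions state that values are expressed in bark, but this section does not give the conversion formula, so the sketch assumes Traunmüller's (1990) approximation without its end corrections for very low and very high values. The function names and the frequencies are hypothetical.

```python
def hz_to_bark(f_hz):
    """Hz to bark, Traunmueller (1990) approximation (an assumption here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def after_effect_point(target_hz, baseline_hz, post_hz):
    """Return (x, y) in bark for one participant, block and vowel.

    x: acoustic target minus first (baseline) production
    y: second (post-exposure) production minus first (baseline) production
    """
    target, baseline, post = (hz_to_bark(v) for v in (target_hz, baseline_hz, post_hz))
    return target - baseline, post - baseline

# Invented F0 medians: target at 220 Hz, baseline at 200 Hz, post-test at 205 Hz.
x, y = after_effect_point(220.0, 200.0, 205.0)
print(round(x, 3), round(y, 3))
```

For the online imitative changes, the same construction applies with the production during the presentation of the acoustic targets in place of the post-exposure production.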

For both F0 and F1 slope coefficients, the remaining data
were entered into ANOVA with the experiment (phonetic
convergence, imitation, and auditory categorization) and the acoustic
space congruency (same, different) as between-subject variables.
In addition, individual one-tailed t-tests were performed for each
experiment in order to test whether the mean slope coefficient
was significantly greater than zero. As previously, in order to test
whether after-effects on F0 and F1 may correlate, a Pearson's
correlation analysis was performed between single subject slope
coefficients on F0 and F1 for each experiment.

For F0, ANOVA on single subject slope coefficients showed
a significant effect of the task [F(2, 42) = 6.98, p < 0.005], with
stronger after-effects related to the acoustic targets after the
imitation task compared to the production and categorization tasks
(mean slope coefficients of 0.07, 0.23, and 0.07 in the production,
imitation, and categorization experiments, respectively). However,
neither an effect of the acoustic space congruency [F(1, 42) = 0.50]
nor a task × acoustic-space congruency interaction [F(2, 42) = 0.49]
was observed. In addition, slope coefficients differed significantly
from zero in the production [t(22) = 2.85, p < 0.01], imitation
[t(22) = 4.92, p < 0.001] and categorization [t(21) = 3.44,
p < 0.005] experiments.

For F1, no significant effect of the task [F(2, 41) = 0.08] or of
the acoustic space congruency [F(1, 41) = 0.11], and no interaction
[F(2, 41) = 0.63], was observed. Slope coefficients differed
significantly from zero in the production experiment [mean slope
coefficient of 0.06; t(21) = 2.59, p < 0.02] but not in the imitation
[mean slope coefficient of 0.04; t(21) = 1.83, p = 0.07] and
categorization [mean slope coefficient of 0.04; t(21) = 1.88, p = 0.07]
experiments.

In addition, Pearson's correlation analyses showed no significant
correlation between single subject slope coefficients observed
for after-effects on F0 and F1 in any of the production (r = -0.08,
slope = -0.07), imitation (r = 0.10, slope = 0.06) and
categorization (r = 0.11, slope = 0.18) experiments.

FIGURE 3 | After-effects observed after vowel production (in green),
imitation (in blue) and manual categorization (in red) on F0 (top)
and F1 (bottom) values. Left: All correlation-points based on 4320
utterances related to participants' imitative changes during the
presentation of acoustic targets compared to their baseline productions
(X-axis: acoustic target minus subject's production during the first
presentation of visual targets, Y-axis: subjects' production during the
second presentation of visual targets minus subject's production during
the first presentation of visual targets; all values are expressed in bark).
Middle: Individual slope coefficients related to participants' after-effects
according to the task (x-axes are subjects, ordered by increasing slope
coefficient; slope coefficients exceeding ±2 SD are represented in white
and were removed from the analyses). Right: Mean slope coefficients
related to participants' after-effects according to the task and the acoustic
space congruency (error bars represent standard error of the mean).
*Significant effects (p < 0.05) are indicated.



Hence, for F0, offline perceptuo-motor recalibration processes
were observed after vowel production, imitation, and
categorization of the acoustic targets, with a stronger after-effect
after voluntary vowel imitation and lower, albeit significant,
after-effects after vowel production and categorization.
Furthermore, these effects were observed independently of the
participant and speaker acoustic space congruency. For F1, a
small after-effect was only observed after vowel production,
although there was also a trend in the same direction after
vowel imitation and categorization. Finally, as for online adaptive
changes, there was a large variability across participants in all
tasks.

RELATIONSHIPS BETWEEN IMITATIVE CHANGES AND AFTER-EFFECTS 
(PRODUCTION AND IMITATION EXPERIMENTS, SEE FIGURE 4) 

In order to test whether imitative changes and after-effects in the
production and imitation experiments may correlate, Pearson's
correlation analyses were performed for both F0 and F1 between
single subject slope coefficients corresponding to the imitative
changes and to the after-effects (see Figure 4). As previously, slope
coefficients exceeding ±2 SD were removed from the analyses
(corresponding to two participants in each experiment for F0,
and four and two participants in the production and imitation
experiments, respectively, for F1, see Figure 4).

For F0, the Pearson's correlation analysis showed a significant
correlation between single subject slope coefficients observed for
imitative changes and for after-effects in both the production (r =
0.64, slope = 0.71, p < 0.005) and imitation (r = 0.78, slope =
0.52, p < 0.001) experiments.
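
Each correlation above is reported with both a Pearson coefficient and a regression slope. The sketch below shows how the two quantities relate, assuming an ordinary least-squares regression of after-effect slopes (y) on imitative-change slopes (x), as the axes of Figure 4 suggest. The function name `pearson_and_slope` and the numbers are hypothetical.

```python
from math import sqrt

def pearson_and_slope(x, y):
    """Return (r, slope) for paired values; slope is from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)   # strength of the linear association
    slope = sxy / sxx           # change in y per unit change in x
    return r, slope

# Invented per-participant slope coefficients.
imitative = [0.0, 0.1, 0.2, 0.3]
after_effect = [0.0, 0.05, 0.1, 0.15]
r, slope = pearson_and_slope(imitative, after_effect)
print(round(r, 3), round(slope, 3))
```

Because slope = r × (SD of y / SD of x), a sizeable slope can accompany a weak, non-significant r when the y values are much more spread out than the x values.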

For F1, the Pearson's correlation analysis also showed a significant
correlation between single subject slope coefficients observed
for imitative changes and for after-effects in the production
experiment (r = 0.53, slope = 0.85, p < 0.03) but not in the
imitation experiment (r = 0.31, slope = 0.27).

RELATIONSHIPS BETWEEN PHONETIC CONVERGENCE AND 
VOLUNTARY IMITATION (PRODUCTION AND IMITATION 
EXPERIMENTS) 

In order to test whether phonetic convergence and voluntary
imitation in the production and imitation experiments may
correlate for the subgroup of subjects who participated in both
experiments, Pearson's correlation analyses were performed for
both F0 and F1 between single subject slope coefficients
corresponding to the convergence and imitative changes in the
two experiments (see Figure 4). As previously, slope coefficients
exceeding ±2 SD were removed from the analyses (corresponding
to two participants for F0, and one for F1, see Figure 4).

The Pearson's correlation analysis showed no significant
correlation between single subject slope coefficients observed for
imitative changes in the two experiments for F0 (r = -0.24,
slope = -0.55) and F1 (r = 0.19, slope = 0.38).

FIGURE 4 | Correlation between imitative changes and
after-effects after vowel production (in green) and imitation
(in blue) on F0 (top) and F1 (bottom) values. X-axis:
individual slope coefficients related to online participants' imitative
changes, Y-axis: individual slope coefficients related to participants'
after-effects.

DISCUSSION

Influential models of speech motor control postulate a key
role for on-line auditory and somatosensory feedback control
mechanisms in speech production and highlight the sensory-motor
nature of speech representations. However, studies on
phonetic convergence suggest that speech production relies not
only on phonemic sensory goals and actual sensory feedback
but also on the processing of external speech inputs. In line
with these findings, the present study demonstrates, in a
non-interactive situation of communication, both unintentional and
voluntary imitative changes in fundamental and first formant
frequencies of acoustic vowel targets during speech production
and imitation tasks. Offline perceptuo-motor recalibration
processes on fundamental frequency (and possibly, marginally,
on first formant frequency) were also observed after vowel
production, imitation, and categorization of the acoustic targets.
In addition, while a significant correlation was observed
between imitative changes and after-effects in both vowel
production and imitation tasks, no correlation occurred between
phonetic convergence effects and voluntary imitative changes for
the subgroup of subjects who participated in both experiments
on vowel production and imitation. Altogether, these results
demonstrate adaptive plasticity of phonemic sensory-motor goals
and suggest that speech production draws on both sensory-motor
knowledge and perceptual learning of the external speech
environment.

ONLINE UNINTENTIONAL AND VOLUNTARY IMITATIVE CHANGES 

Phonetic convergence effects were initially explored in natural
and interactive settings, notably during conversational
exchanges between two speaking partners (Pardo, 2006; Aubanel
and Nguyen, 2010; Pardo et al., 2010; Kim et al., 2011; Lelong and
Bailly, 2011). This led to the hypothesis that "speech
accommodation" may facilitate conversational exchange by contributing to
setting a common ground between speakers (Giles et al., 1991;
see also Garrod and Pickering, 2004; Pickering and Garrod, 2004,
2007). However, other studies conducted in a non-interactive
laboratory setting, as when hearing and/or seeing a recorded
speaker influences the production of similar or dissimilar speech
sounds (Goldinger and Azuma, 2004; Gentilucci and Cattaneo,
2005; Delvaux and Soquet, 2007; Gentilucci and Bernardis, 2007;
Kappes et al., 2009; Babel and Bulatov, 2012), indicate that
convergence mechanisms do not depend on mutual adjustments and
social attunement only. In our view, these latter studies provide
powerful evidence that, unless hindered by higher-order
socio-psychological factors, phonetic convergence is a highly
automatized process (for a review, see Delvaux and Soquet, 2007) that
may also be triggered by low-level sensory and motor adaptive
processes.

Based on F0 and F1 acoustic analyses of a large corpus of
recorded vowels, the present study replicates and extends phonetic
convergence effects previously observed on fundamental
frequency (Gregory, 1986; Gregory et al., 1993; Bosshardt
et al., 1997; Kappes et al., 2009; Babel and Bulatov, 2012) and
on formant frequencies and spectral distributions (Gentilucci
and Cattaneo, 2005; Delvaux and Soquet, 2007; Gentilucci and
Bernardis, 2007; Aubanel and Nguyen, 2010; Lelong and Bailly,
2011). First, online imitative changes on both F0 and F1 in
relation to the acoustic vowel targets were observed in a
non-interactive situation of communication during the production
task, with none of the participants reporting having voluntarily
imitated the acoustic stimuli. Second, although previous studies
usually involved the production of words or sentences (Goldinger,
1998; Goldinger and Azuma, 2004; Pardo, 2006; Delvaux and
Soquet, 2007; Kappes et al., 2009; Aubanel and Nguyen, 2010;
Pardo et al., 2010; Kim et al., 2011; Lelong and Bailly, 2011; Babel
and Bulatov, 2012), adaptive changes were here observed during
vowel production, thus minimizing lexical/semantic processing
(for phonetic convergence effects on F0 and/or F1 during
syllable or non-word production, see Gentilucci and Cattaneo, 2005;
Gentilucci and Bernardis, 2007; Kappes et al., 2009). Altogether,
these findings thus suggest that phonetic convergence may also
derive from unintentional and automatic adaptive sensory-motor
speech mechanisms. However, it is worth noting that,
although significant at the group level, the magnitude of these
adaptive changes was rather small (mean slope coefficients of 0.08
and of 0.04 for F0 and F1, respectively) and quite variable across
participants (individual slope coefficients ranging from -0.16 to
0.27 and from -0.14 to 0.21 for F0 and F1, respectively). In
addition, although phonetic convergence was attested for both F0 and
F1, adaptive changes were half as large for F1. In the experiments,
however, F0 values of the vowel targets remained similar within
each block while F1 varied according to each vowel type. Although
sensory-motor and convergence mechanisms are likely to differ
for these acoustic parameters at the acoustical, biomechanical
and neurobiological levels, it appears difficult to speculate on
these observed differences. Finally, although gender effects have
been previously observed on phonetic convergence (Pardo, 2006;
Pardo et al., 2010; Babel and Bulatov, 2012), the exact nature of
this mediation remains unclear and may depend on specific
experimental designs and/or "macro" social mechanisms, which are
beyond the scope of this study. Given the limited number of participants
in each sub-experimental group condition (i.e., six female and six
male participants presented with acoustic targets from the female
speakers, and six female and six male participants presented with
acoustic targets from the male speakers), we rather focused on
the participant and speaker acoustic space congruency. Phonetic
convergence was observed independently of the participant and
speaker acoustic space congruency, a result suggesting that
phonetic convergence on vowels and in a non-interactive situation of
communication is pervasive and not strongly influenced by the
acoustic distance between the participant and the model speaker.

To compare phonetic convergence and voluntary imitation of
the acoustic vowels, a second group of participants performed the
same experiment except that they were explicitly asked to
imitate the acoustic targets. As expected, stronger online imitative
changes according to the acoustic vowel targets were observed
during voluntary imitation (mean slope coefficients of 0.48 vs.
0.08 for F0 and 0.13 vs. 0.04 for F1, for the imitation and
production tasks, respectively). As in the production task, imitative
changes were however quite variable across participants
(individual slope coefficients ranging from -0.8 to 0.99 and from
-0.24 to 0.43 for F0 and F1, respectively). In addition, no
significant correlation between phonetic convergence and voluntary
imitation on either F0 or F1 was observed for the subgroup of
subjects who participated in both the production and imitation
tasks. Interestingly, although the correlation was not significant,
the regression slope for F0 nevertheless appears quite high
(slope = -0.55). Hence, although this last result does not indicate
any significant correlation, possible dependencies between
phonetic convergence and voluntary imitation have to be further
investigated in future studies.

PERCEPTUO-MOTOR RECALIBRATION PROCESSES

Interestingly, previous studies showed clear evidence of post- 
exposure imitation, with experimental designs and long-lasting 
effects that preclude strategic explanations (Goldinger and 
Azuma, 2004; Pardo, 2006; Delvaux and Soquet, 2007). In these 
studies, phonetic convergence was first attested during the pro- 
duction of auditorily presented words in a non-interactive situa- 
tion of communication. Offline adaptation to the acoustic targets 
was however observed in post-tests occurring either immediately 
(Pardo, 2006; Delvaux and Soquet, 2007) or even conducted one 
week after the production task (Goldinger and Azuma, 2004; 
see also Goldinger, 1998 using a close-shadowing task). These 
latter findings suggest that long-term memory to some extent pre- 
serves detailed traces of the auditorily presented words and thus 
support episodic/exemplar theories of word processing assuming 
that paralinguistic details of a spoken word are stored together 
as a memory trace (e.g., Nygaard et al., 1994; Goldinger, 1996, 
1998; Nygaard and Pisoni, 1998), although hybrid models combining
abstract phonological representations with episodic memory
traces are also consistent with these results (e.g., McQueen
et al., 2006; Pierrehumbert, 2006). Importantly, Pardo (2006) and
Delvaux and Soquet (2007) also propose that these observed pho- 
netic convergence and associated long-term adaptive changes may 
be at the source of gradual diachronic changes of a phonological 
system in a community. 

In line with these studies, offline perceptuo-motor recalibration
processes were here observed for F0 after vowel production,
imitation and auditory categorization of the acoustic targets, with
a stronger after-effect observed after voluntary vowel imitation.
The fact that after-effects equally occurred following prior vowel
production and perceptual categorization of the acoustic targets
suggests that these effects rely on perceptual processing
and learning from the acoustic targets, without the need for a
specific motor learning stage. As for online imitative changes,
these effects were observed independently of the participant and
speaker acoustic space congruency and, although significant at
the group level, the magnitude of these after-effects was rather
small (mean slope coefficients of 0.07, 0.23, and 0.07 for the
production, imitation and categorization tasks, respectively) and
quite variable across participants (individual slope coefficients
ranging from -0.20 to 0.32, from -0.23 to 0.68 and from -0.12
to 0.25 for the production, imitation and categorization tasks,
respectively). As expected, a significant correlation between single
subject slope coefficients for imitative changes and after-effects
was also observed in both the production and imitation tasks.

For F1, a small after-effect was only observed after vowel
production (although there was a trend after vowel imitation and
categorization), with a significant correlation between single
subject slope coefficients for imitative changes and after-effects. The
after-effect for F1 in the production task and the trend found in
the imitation and categorization tasks were therefore observed
despite F1 values of the acoustic targets varying according to each
vowel type in each block. Finally, it is also interesting to note that
for F1 the imitation task did not provide stronger after-effects as
compared to the other tasks. More intriguing is the very low
after-effect observed in the imitation task when subjects and targets
were of the same gender, a phenomenon for which we have no
clear explanation yet.

PERCEPTUO-MOTOR LEARNING AND INTERNAL MODELS OF SPEECH 
PRODUCTION 

Altogether, our results demonstrate adaptive plasticity of phone- 
mic sensory-motor goals in a non-interactive situation of com- 
munication, without lexical/semantic processing of the acoustic 
targets. Although they appear in line with previous studies on 
phonetic convergence and do not contradict the theoretical pro- 
posal that adaptive changes in speech production facilitate con- 
versational exchanges between speaking partners, these results 
demonstrate that, in addition to social attunement and lexi- 
cal/semantic processing, convergence effects may also be triggered 
by low-level sensory and motor adaptive speech processes. From 
that point of view, future studies on phonetic convergence con- 
trasting interactive and non-interactive laboratory settings will be 
of great interest to further determine whether social interactions 
might enhance imitative changes. 

Together with previous studies on phonetic convergence and 
imitation, the observed adaptive plasticity of phonemic sensory- 
motor goals sheds an important light on speech motor control 
and internal models of speech production (for reviews, see Perkell
et al., 1997, 2000; Perrier, 2005, 2012; Guenther, 2006; Guenther
and Vladusich, 2012; Perkell, 2012). As previously noted, these
models postulate that auditory and somatosensory systems play a 
key role in speech motor control and that speech goals are defined 
in multi-dimensional motor, auditory, and somatosensory spaces. 
However, they mainly focus on the flexibility of motor processes, 
without regard for possible adaptive changes of phonemic sen- 
sory representations that are presumed to constitute the sensory 
goals of speech movements. Convergence and perceptuo-motor 
recalibration processes however demonstrate that speech produc- 
tion relies not only on the intended phonemic sensory goals and 
actual sensory feedback but also on the processing of external 
speech inputs. In our view, these effects are based on complex 
sensorimotor interactions, allowing the speaker to compare or 
tune his/her own sensory and motor speech repertoire with the 
phonetic characteristics of the perceived utterance, and leading 
to perceptuo-motor learning from the external speech environment.
During speech production, phonetic convergence and
after-effects may therefore arise from induced plasticity of phonemic
sensory and motor representations, in relation to relevant
adjustment of motor commands. Convergence effects are thus of 
considerable interest since they suggest that speech motor goals 
are continuously updated in response to changes in the sensory 
and linguistic aspects of speech inputs. Hence, as also advocated 
by Perkell (2012), adaptive processes, likely to modify online, to 
a certain extent, sensory speech representations, will have to be 
taken into account in future versions of speech motor control 
models. 

From this perspective, there is now considerable neurobiological
evidence that sensorimotor interactions play a key role in both 
speech perception and speech production. In line with internal 
models of speech production, modulations of neural responses
observed within the auditory and somatosensory cortices when
speaking are thought to reflect feedback control mechanisms
in which predicted sensory consequences of the speech-motor 
act are compared with actual sensory input in order to further 
control production (Guenther, 2006; Tian and Poeppel, 2010; 
Hickok et al., 2011; Houde and Nagarajan, 2011; Price et al., 
2011; Guenther and Vladusich, 2012; Hickok, 2012). In addition, 
it has been suggested that motor activity during speech per- 
ception partly constrains phonetic interpretation of the sensory 
inputs through the internal generation of candidate articula- 
tory categories (Callan et al., 2004; Wilson and Iacoboni, 2006; 
Skipper et al., 2007; Poeppel et al., 2008; Rauschecker and Scott,
2009; Hickok et al., 2011; Rauschecker, 2011). From these mod- 
els, perceptuo-motor learning and plasticity of phonemic goals 
induced by convergence and sensory-motor adaptive processes 
might depend on both a ventral and dorsal stream (Guenther, 
2006; Hickok and Poeppel, 2007; Rauschecker and Scott, 2009; 
Hickok et al., 2011; Rauschecker, 2011; Guenther and Vladusich, 
2012; Hickok, 2012; see also Grabski et al., 2013 for recent brain- 
imaging evidence that vowel production and perception both 
rely on these dorsal and ventral streams). The ventral stream 
("what") is supposed to be in charge of phonological and lexical
processing, and is thought to be localized in the anterior part of
the superior temporal gyrus/sulcus (Scott and Johnsrude, 2003; 
Rauschecker and Scott, 2009; Rauschecker, 2011) or in the pos- 
terior part of the middle temporal gyrus and superior temporal 
sulcus (Hickok and Poeppel, 2007). The dorsal stream ("how") 
would deal with sensory-motor mapping between sensory speech 
representations in the auditory temporal and somatosensory pari- 
etal cortices and articulatory representations in the ventral pre- 
motor cortex and the posterior part of the inferior frontal gyrus, 
with sensorimotor interaction converging in the supramarginal 
gyrus (Rauschecker and Scott, 2009; Rauschecker, 2011) or in 
area SPT (a brain region within the planum temporale near the 
parieto-temporal junction; Hickok and Poeppel, 2007). In line 
with the involvement of both the dorsal and ventral streams in 
imitative changes in speech production, recent studies using rep- 
etition, shadowing or voluntary imitation tasks have provided 
evidence for a neuro-functional/neuro-anatomical signature of 
speech imitation ability, mostly relying on the superior tempo- 
ral gyrus, the premotor cortex and the inferior parietal lobule 
(Peschke et al., 2009; Irwin et al., 2011; Reiterer et al., 2011;
Mashal et al., 2012). Building on these findings, the neural basis of
low-level sensory and motor adaptive speech processes involved
in phonetic convergence and perceptuo-motor recalibration
processes remains to be investigated in future studies.